On the Benefits of Sampling in Privacy Preserving Statistical Analysis on Distributed Databases

نویسندگان

  • Bing-Rong Lin
  • Ye Wang
  • Shantanu Rane
چکیده

We consider a problem where mutually untrusting curators possess portions of a vertically partitioned database containing information about a set of individuals. The goal is to enable an authorized party to obtain aggregate (statistical) information from the database while protecting the privacy of the individuals, which we formalize using Differential Privacy. This process can be facilitated by an untrusted server that provides storage and processing services but should not learn anything about the database. This work describes a data release mechanism that employs Post Randomization (PRAM), encryption and random sampling to maintain privacy, while allowing the authorized party to conduct an accurate statistical analysis of the data. Encryption ensures that the storage server obtains no information about the database, while PRAM and sampling ensures individual privacy is maintained against the authorized party. We characterize how much the composition of random sampling with PRAM increases the differential privacy of system compared to using PRAM alone. We also analyze the statistical utility of our system, by bounding the estimation error — the expected `2-norm error between the true empirical distribution and the estimated distribution — as a function of the number of samples, PRAM noise, and other system parameters. Our analysis shows a tradeoff between increasing PRAM noise versus decreasing the number of samples to maintain a desired level of privacy, and we determine the optimal number of samples that balances this tradeoff and maximizes the utility. In experimental simulations with the UCI “Adult Data Set” and with synthetically generated data, we confirm that the theoretically predicted optimal number of samples indeed achieves close to the minimal empirical error, and that our analytical error bounds match well with the empirical results.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Performance Analysis of Privacy Preserving Naïve Bayes Classifiers for Distributed Databases

The problem of secure and fast distributed classification is an important one. The main focus of the paper is on privacy preserving distributed classification rule mining. This research paper addresses the performance analysis of privacy preserving Naïve Bayes classifiers for horizontal and vertical partitioned databases. The Naïve Bayes classifier is a simple but efficient baseline classifier....

متن کامل

Privacy Preserving Linear Regression on Distributed Databases

Studies that combine data from multiple sources can tremendously improve the outcome of the statistical analysis. However, combining data from these various sources for analysis poses privacy risks. A number of protocols have been proposed in the literature to address the privacy concerns; however they do not fully deliver on either privacy or complexity. In this paper, we present a (theoretica...

متن کامل

Differentially Private Local Electricity Markets

Privacy-preserving electricity markets have a key role in steering customers towards participation in local electricity markets by guarantying to protect their sensitive information. Moreover, these markets make it possible to statically release and share the market outputs for social good. This paper aims to design a market for local energy communities by implementing Differential Privacy (DP)...

متن کامل

Privacy and Security of Big Data in THE Cloud

Big data has been arising a growing interest in both scien- tific and industrial fields for its potential value. However, before employing big data technology into massive appli- cations, a basic but also principle topic should be investigated: security and privacy. One of the biggest concerns of big data is privacy. However, the study on big data privacy is still at a very early stage. Many or...

متن کامل

A centralized privacy-preserving framework for online social networks

There are some critical privacy concerns in the current online social networks (OSNs). Users' information is disclosed to different entities that they were not supposed to access. Furthermore, the notion of friendship is inadequate in OSNs since the degree of social relationships between users dynamically changes over the time. Additionally, users may define similar privacy settings for their f...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1304.4613  شماره 

صفحات  -

تاریخ انتشار 2013